Skip to content

fix: add thread safety and input validation#10

Merged
krisoye13 merged 1 commit intomainfrom
fix/thread-safety-and-validation
Feb 2, 2026
Merged

fix: add thread safety and input validation#10
krisoye13 merged 1 commit intomainfrom
fix/thread-safety-and-validation

Conversation

@krisoye
Copy link
Copy Markdown
Owner

@krisoye krisoye commented Feb 2, 2026

Summary

This PR addresses thread safety issues and adds input validation based on the QA review findings from PR #9.

Thread Safety Fixes

Cache Module (src/document_analysis_mcp/cache/__init__.py):

  • Added fcntl file locking for cache metadata read/write operations
  • Added threading.Lock for in-memory metadata protection
  • Implemented atomic writes using temp file + rename pattern
  • Created _remove_entry_unlocked() for use when lock is already held

Tracking Module (src/document_analysis_mcp/tracking/__init__.py):

  • Added fcntl exclusive lock for JSON Lines append operations
  • Added shared lock for read operations to prevent read during write

Input Validation

OCR Tool (src/document_analysis_mcp/tools/ocr.py):

  • Added VALID_LANGUAGES frozenset with 28 common Tesseract language codes
  • Added _validate_language() function that logs warning and returns default for invalid codes
  • Language validation is applied before OCR processing begins

Testing

Added comprehensive concurrent operation tests:

  • TestCacheThreadSafety: 4 tests for concurrent puts, gets, mixed operations, and cleanup
  • TestTrackerThreadSafety: 3 tests for concurrent records, mixed read/write, and line integrity
  • TestLanguageValidation: 6 tests for language validation behavior

Changes

  • src/document_analysis_mcp/cache/__init__.py - Thread safety with file and memory locks
  • src/document_analysis_mcp/tracking/__init__.py - File locking for JSONL operations
  • src/document_analysis_mcp/tools/ocr.py - Language validation
  • tests/test_cache.py - Concurrent operation tests
  • tests/test_tracking.py - Concurrent operation tests
  • tests/test_ocr.py - Language validation tests

Testing

All 223 tests pass:

pytest tests/ -v
============================= 223 passed in 6.31s ==============================

Linting passes:

ruff check src/ tests/
ruff format --check src/ tests/

Notes

  • Uses fcntl which is Unix-only. If cross-platform support is needed, consider portalocker package.
  • The atomic write pattern (temp file + rename) prevents partial writes from corrupting metadata.

🤖 Generated with Claude Code

- Add file locking (fcntl) to cache metadata operations for concurrent access
- Add threading.Lock for in-memory cache metadata protection
- Add file locking to usage tracking append and read operations
- Add language parameter validation for OCR tool with VALID_LANGUAGES set
- Add atomic metadata writes using temp file + rename pattern
- Add comprehensive concurrent operation tests for cache and tracking

Thread safety improvements:
- Cache: _save_metadata() uses exclusive lock with atomic write
- Cache: _load_metadata() uses shared lock for concurrent reads
- Cache: All metadata modifications protected by threading.Lock
- Tracking: record() uses exclusive lock for append operations
- Tracking: get_records() uses shared lock for read operations

Input validation:
- OCR: Invalid language codes log warning and fall back to "eng"
- OCR: VALID_LANGUAGES includes 28 common Tesseract language codes

Fixes issues identified in QA review of PR #9

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@krisoye13 krisoye13 merged commit 7018024 into main Feb 2, 2026
4 checks passed
@krisoye krisoye deleted the fix/thread-safety-and-validation branch February 6, 2026 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants